NSF PAR Search | NSF Public Access Repository

The Photographer's Eye: Teaching Multimodal Large Language Models to See, and Critique Like Photographers

Qi, Daiqing; Zhao, Handong; Shi, Jing; Jenni, Simon; Fan, Yifei; Dernoncourt, Franck; Cohen, Scott; Li, Sheng (June 2025, The IEEE/CVF Conference on Computer Vision and Pattern Recognition)

Photographer, curator, and former director of photography at the Museum of Modern Art (MoMA), John Szarkowski remarked in *William Eggleston's Guide*, "While editing directly from life, photographers have found it too difficult to see simultaneously both the blue and the sky." Szarkowski insightfully revealed a notable gap between general and aesthetic visual understanding: while the former emphasizes identifying factual elements in an image (the sky), the latter transcends mere object identification, viewing it instead as an aesthetic component: a pure expanse of blue, valued purely as a color block in visual aesthetics. Such distinctions between general visual understanding (detection, localization, etc.) and aesthetic perception (color, lighting, composition, etc.) pose a significant challenge for existing Multimodal Large Language Models (MLLMs) in comprehending image aesthetics, which is increasingly needed in real-world applications, from image recommendation and enhancement to generation. To fundamentally advance the aesthetic understanding of MLLMs, we introduce a novel dataset, PhotoCritique, derived from extensive discussions among professional photographers and enthusiasts, distinguished by its large scale, expertise, and diversity. Additionally, we propose a new model, PhotoEye, an MLLM featuring a language-guided multi-view vision fusion mechanism for understanding image aesthetics from multiple perspectives. Finally, we introduce PhotoBench, a comprehensive and professional benchmark for aesthetic visual understanding. Our model demonstrates significant advantages over both open-source and commercial models on existing benchmarks and PhotoBench.
Free, publicly-accessible full text available June 11, 2026
SV-RAG: LoRA-Contextualizing Adaptation of MLLMs for Long Document Understanding

Chen, Jian; Zhang, Ruiyi; Zhou, Yufan; Yu, Tong; Dernoncourt, Franck; Gu, Jiuxiang; Rossi, Ryan A; Chen, Changyou; Sun, Tong (April 2025, The Thirteenth International Conference on Learning Representations)

Free, publicly-accessible full text available April 24, 2026
Demystifying the Power of Large Language Models in Graph Generation

https://doi.org/10.18653/v1/2025.findings-naacl.456

Wang, Yu; Rossi, Ryan A; Park, Namyong; Ahmed, Nesreen K; Koutra, Danai; Dernoncourt, Franck; Derr, Tyler (April 2025, Association for Computational Linguistics)

Free, publicly-accessible full text available April 1, 2026
ULLME: A Unified Framework for Large Language Model Embeddings with Generation-Augmented Learning

https://doi.org/10.18653/v1/2024.emnlp-demo.24

Man, Hieu; Ngo, Nghia Trung; Dernoncourt, Franck; Nguyen, Thien Huu (November 2024, Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing: System Demonstrations (EMNLP 2024))

Full Text Available
A Large-scale Training Paradigm for Graph Generative Models

Wang, Yu; Rossi, Ryan; Park, Namyong; Chen, Huiyuan; Ahmed, Nesreen; Trivedi, Puja; Dernoncourt, Franck; Koutra, Danai; Derr, Tyler (January 2025, The Thirteenth International Conference on Learning Representations (ICLR))

Free, publicly-accessible full text available January 22, 2026
MCECR: A Novel Dataset for Multilingual Cross-Document Event Coreference Resolution

https://doi.org/10.18653/v1/2024.findings-naacl.245

Pouran Ben Veyseh, Amir; Lai, Viet; Nguyen, Chien; Dernoncourt, Franck; Nguyen, Thien Huu (June 2024, Findings of the Association for Computational Linguistics: NAACL 2024)

Full Text Available
Mastering Context-to-Label Representation Transformation for Event Causality Identification with Diffusion Models

Man, Hieu; Dernoncourt, Franck; Nguyen, Thien Huu (March 2024, Proceedings of the 38th AAAI Conference on Artificial Intelligence)
Hierarchical Selection of Important Context for Generative Event Causality Identification with Optimal Transports

Man, Hieu; Nguyen, Chien Van; Ngo, Nghia Trung; Ngo, Linh; Dernoncourt, Franck; Nguyen, Thien Huu (May 2024, Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024))

Full Text Available
Editing Partially Observable Networks via Graph Diffusion Models

Trivedi, Puja; Rossi, Ryan A; Arbour, David; Yu, Tong; Dernoncourt, Franck; Kim, Sungchul; Lipka, Nedim; Park, Namyong; Ahmed, Nesreen K; Koutra, Danai (July 2024, International Conference on Machine Learning)

Full Text Available
CulturaX: A Cleaned, Enormous, and Multilingual Dataset for Large Language Models in 167 Languages

Nguyen, Thuat; Nguyen, Chien Van; Lai, Viet Dac; Man, Hieu; Ngo, Nghia Trung; Dernoncourt, Franck; Rossi, Ryan A; Nguyen, Thien Huu (May 2024, Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024))

Full Text Available
